Soft Margins for AdaBoost. Produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150
Abstract
Recently, ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low-noise regime, it clearly does so for higher noise levels. Central to understanding this fact is the margin distribution: we find that AdaBoost, which performs gradient descent in an error function with respect to the margin, asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns (here an interesting overlap with Support Vectors emerges). This is clearly a sub-optimal strategy in the noisy case, and regularization, i.e. a mistrust in the data, must be introduced into the algorithm to alleviate the distortions that difficult patterns (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original AdaBoost algorithm to achieve a soft margin, a concept known from Support Vector learning. In particular, we suggest (1) regularized AdaBoost_Reg, which uses the soft margin directly in a modified loss function, and (2) regularized linear and quadratic programming (LP/QP-) AdaBoost, where the soft margin is attained by introducing slack variables. Extensive simulations demonstrate that the proposed regularized AdaBoost-type algorithms are useful and competitive for noisy data.
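To make the margin vocabulary above concrete, here is a minimal sketch in Python (standard library only) of plain discrete AdaBoost with decision stumps. It computes the normalized margin ρ_i = y_i Σ_t α_t h_t(x_i) / Σ_t α_t of each training pattern together with the final pattern weights, and a small `slacks` helper shows how slack variables ξ_i = max(0, ρ − ρ_i) measure how far each pattern falls short of a target margin ρ. All function names, the toy data with one flipped label, and the target ρ = 0.2 are hypothetical; this is the textbook algorithm, not the regularized AdaBoost_Reg or LP/QP-AdaBoost proposed in the paper.

```python
import math

# Hypothetical helper names throughout: textbook discrete AdaBoost with decision
# stumps, NOT the paper's regularized AdaBoost_Reg or LP/QP-AdaBoost.

def stump_predict(x, feature, threshold, polarity):
    """Decision stump: returns +polarity if x[feature] > threshold, else -polarity."""
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted training error."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: below the minimum, then midpoints between neighbours.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in thresholds:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost(X, y, rounds):
    """Return the weighted ensemble and the final pattern weights w_i."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # entries: (alpha_t, feature, threshold, polarity)
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        # Keep err strictly inside (0, 1) so the log and the division are defined.
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # Misclassified patterns (y_i * h_t(x_i) = -1) gain weight; correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble, w


def margins(ensemble, X, y):
    """Normalized margins rho_i = y_i * sum_t alpha_t h_t(x_i) / sum_t alpha_t, in [-1, 1]."""
    total = sum(alpha for alpha, _, _, _ in ensemble)
    return [yi * sum(a * stump_predict(xi, f, thr, pol) for a, f, thr, pol in ensemble) / total
            for xi, yi in zip(X, y)]


def slacks(margin_values, rho):
    """Generic soft-margin slack xi_i = max(0, rho - rho_i) for a target margin rho."""
    return [max(0.0, rho - m) for m in margin_values]


# Toy 1-D data with one deliberately flipped label (x = 4.0) acting as an outlier.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
y = [-1, -1, -1, 1, -1, 1, 1, 1]
ensemble, final_w = adaboost(X, y, rounds=50)
rho_i = margins(ensemble, X, y)
xi_slack = slacks(rho_i, rho=0.2)
for x_row, m, wi in zip(X, rho_i, final_w):
    print(f"x={x_row[0]:.0f}  margin={m:+.3f}  weight={wi:.3f}")
```

Inspecting the printed weights is a quick way to see which patterns the reweighting concentrates on near the flipped label; no particular output is claimed here. A soft-margin method, by contrast, would tolerate a nonzero slack for such a pattern instead of spending more and more weight on it.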
Related papers
Data-dependent Structural Risk Minimisation for Perceptron Decision Trees. Produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150
Perceptron Decision Trees (also known as Linear Machine DTs, etc.) are analysed in order that data-dependent Structural Risk Minimization can be applied. Data-dependent analysis is performed which indicates that choosing the maximal margin hyperplanes at the decision nodes will improve the generalization. The analysis uses a novel technique to bound the generalization error in terms of the marg...
Discrete versus Analog Computation: Aspects of Studying the Same Problem in Different Computational Models. Produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150
In this tutorial we want to outline some of the features that arise when analyzing the same computational problems in different complexity-theoretic frameworks. We will focus on two problems: the first related to mathematical optimization and the second dealing with the intrinsic structure of complexity classes. Both examples serve well for working out to what extent different approaches to the same pro...
Multiplicative Updatings for Support-Vector Learning. Produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150
Support Vector machines find maximal margin hyperplanes in a high-dimensional feature space. Theoretical results exist which guarantee a high generalization performance when the margin is large or when the number of support vectors is small. Multiplicative-Updating algorithms are a new tool for perceptron learning whose theoretical properties are well studied. In this work we present a Multiplica...
Dynamically Adapting Kernels in Support Vector Machines. Produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150
The kernel parameter is one of the few tunable parameters in Support Vector machines, and it controls the complexity of the resulting hypothesis. The choice of its value amounts to model selection, and is usually performed by means of a validation set. We present an algorithm which can automatically perform model selection and learning with no additional computational cost and with no need of a va...
Latent Semantic Kernels for Feature Selection. Produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150
Latent Semantic Indexing is a method for selecting informative subspaces of feature spaces. It was developed for information retrieval to reveal semantic information from document co-occurrences. The paper demonstrates how this method can be implemented implicitly in a kernel-defined feature space and hence adapted for application to any kernel-based learning algorithm and data. Experiments with...